2 Jun 2025
Introduction to R and RStudio
Reproducible data analysis with Quarto
Organise your work with R Projects
R data objects
Every R release is named after a reference to a Peanuts comic strip. For example, R version 4.3.3 (2024-02-29) is called “Angel Food Cake”.
When you start RStudio and R, only the base packages are activated: the basic installation with basic functionality.
R users all over the world have developed almost 20,000 packages. See the Comprehensive R Archive Network (CRAN).
It is not efficient to have all these packages installed every time you use R. Install only the packages you want to use.
Use sessionInfo() to see which packages are active.
This is what the basic installation looks like:
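A minimal sketch of the call. The output depends on your R version, operating system, and attached packages, so none is shown here. The object name info is only for illustration.

```r
# sessionInfo() reports the R version, the platform, and the attached packages
info <- sessionInfo()
info

# Only the base packages that are attached in this session
info$basePkgs
```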
For an overview of the packages you have installed, see the “Packages” tab in the output pane.
Packages are to R what apps are on your mobile phone.
When you want to use a package for the first time, you have to install the package.
Each time you want to use the package, you have to load (activate) it.
To load a package, use the following code (similar to opening an app on your phone):
To close (detach) a package, use (similar to closing an app on your phone):
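The three steps (install, load, close) can be sketched as follows. The package names are assumptions chosen for illustration: ggplot2 stands in for any CRAN package, and splines is used for loading and closing because it ships with R and needs no installation.

```r
# Install once per computer (needs an internet connection), so it is
# commented out here. "ggplot2" is only an illustrative package name.
# install.packages("ggplot2")

# Load (activate) a package in every session where you need it
library(splines)
"package:splines" %in% search()   # TRUE while the package is attached

# Close (detach) the package again
detach("package:splines")
"package:splines" %in% search()   # FALSE after detaching
```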
Quarto
Why Quarto?
The need to combine code and text and to document all the steps, in order to make reproducible (scientific) reports of data analyses.
It is efficient. Generate and update reports in all kinds of formats.
See the R Markdown Cheat Sheet for a complete list of options.
Code chunks start with {r} (for R code). You can give code chunks names (here cars).
This is what the result looks like in the rendered HTML document, which displays both the R code and the results:
You can choose to hide the R code with echo=FALSE in the chunk header:
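As a sketch, the body of such a chunk could look like this. It assumes Quarto's `#|` comment syntax for chunk options, which is an alternative to writing the options in the chunk header. The label cars follows the text above, and cars is a data set that ships with R.

```r
#| label: cars
#| echo: false

# With echo: false, only the printed summary appears in the rendered report
summary(cars)
```

When run outside Quarto, the `#|` lines are ordinary R comments, so the same code also works in the console.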
See the Quarto reference page for a complete list of chunk options.
Quarto is the evolution of R Markdown. In RStudio you can find the Markdown Reference:
Every time you start a new (data analysis) project, make it a habit to create a new RStudio Project.
Because you want your project to work:
RStudio Projects create a convention that guarantees that the project can be moved around on your computer or onto other computers and will still “just work”. Paths are relative to the project directory (no more broken paths!).
All data, scripts, and output should be stored within the project directory.
Every time you want to work on this project: open the project by clicking the .Rproj file.
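A minimal sketch of what a relative path looks like in practice. The folder data and the file my_data.csv are hypothetical names used only for illustration.

```r
# Relative path: resolved from the project directory, which is the working
# directory when you open the project through its .Rproj file
data_path <- file.path("data", "my_data.csv")
data_path

# Check where R is currently looking and whether the file is there
getwd()
file.exists(data_path)

# Reading the file then works on any computer that has the project folder
# my_data <- read.csv(data_path)
```

An absolute path that starts at a drive letter or a user folder would break as soon as the project is moved, which is the problem the project convention avoids.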
The simplest thing you can do with R is arithmetic:
Here are the common arithmetic operators:
| Operation | Operator |
|---|---|
| Addition | + |
| Subtraction | - |
| Multiplication | * |
| Division | / |
| Exponentiation | ^ or ** |
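A few illustrative calculations with these operators (the numbers are arbitrary):

```r
2 + 3         # addition: 5
7 - 2         # subtraction: 5
4 * 3         # multiplication: 12
10 / 4        # division: 2.5
2 ^ 3         # exponentiation: 8
2 ** 3        # same as 2 ^ 3: 8
(2 + 3) * 4   # parentheses control the order of operations: 20
```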
In the reading materials you learned about the <- assignment operator.
Here x is assigned the value 8 with x <- 8.
If you run this code, you will see x under “Values” followed by 8 in the Environment pane.
Assigning does not print the value 8.
If you want to print the value 8, you can do:
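A short sketch of the common ways to assign and print, using the same x and 8 as in the text:

```r
x <- 8      # assigns 8 to x; nothing is printed

x           # typing the name prints the value: [1] 8
print(x)    # the same, written out explicitly
(x <- 8)    # parentheses around an assignment assign and print in one step
```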
When you assign values with the assignment operator <- you create an R object.
Objects can contain data, functions or even other objects.
The most commonly used objects are:
A vector is an ordered collection of values (data). The simplest object in R is a vector with one element:
The function c(...) collects elements in a vector.
seq(from, to) or : generates a sequence of integers.
rep(..., times) repeats ... a number of times.
Vectors (and other R objects) can contain different data types (classes):
Numeric
Character
Logical data can take only one of two values: TRUE or FALSE.
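The functions and classes above can be sketched as follows. All object names and values are made up for illustration.

```r
a <- 8                             # a vector with one element
b <- c(1, 5, 9)                    # c() collects elements in a vector
s1 <- seq(2, 6)                    # the sequence 2 3 4 5 6
s2 <- 2:6                          # the same sequence with the : operator
r <- rep(c("a", "b"), times = 2)   # "a" "b" "a" "b"

class(b)       # "numeric"
class(s2)      # "integer"
class(r)       # "character"
class(b > 4)   # "logical": b > 4 gives FALSE TRUE TRUE
```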
matrix(data, nrow, ncol) generates a matrix
data.frame(...) collects vectors as variables in a data frame
list(...) creates a list
factor(...) turns a vector into a factor
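A small sketch that builds one object of each kind. The names and values are invented for illustration only.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # 2 x 3 matrix, filled column by column

d <- data.frame(id = 1:3,
                name = c("Ann", "Bob", "Cy"),
                passed = c(TRUE, FALSE, TRUE))   # variables of different classes

l <- list(m, d, "some text")           # a list can hold any kind of object

f <- factor(c("low", "high", "low"))   # a character vector turned into a factor

dim(m)      # 2 3
str(d)      # compact overview of the variables in the data frame
length(l)   # 3
levels(f)   # "high" "low" (levels are sorted alphabetically by default)
```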
names()
Use names() to assign names to elements in R objects.
For example, to the elements of a list:
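A minimal sketch, with a made-up list called scores:

```r
scores <- list(1:3, "exam", TRUE)            # an unnamed list with three elements
names(scores) <- c("id", "test", "passed")   # give each element a name

names(scores)   # "id" "test" "passed"
scores$test     # named elements can be extracted with $: "exam"
```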
When to use which R data object?
| Object | Use | Why |
|---|---|---|
| data frame | statistical analysis | can store variables of any class |
| model formula | statistical models, plots | concise and readable, flexible, consistent across functions, packages |
| lists | storage of output | can store any object of any class |
| vectors/matrices | programming | can do fast calculations |
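To illustrate how these objects work together, here is a sketch that uses a model formula on a data frame and inspects the output, which is stored as a list. The choice of lm() and the built-in cars data set is an assumption made for this example.

```r
# Data frame + model formula: regress stopping distance on speed
fit <- lm(dist ~ speed, data = cars)

coef(fit)      # estimated intercept and slope

# The fitted model is stored as a list of named components
is.list(fit)   # TRUE
names(fit)
```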
snake_case: words are separated by underscores (_), and all letters are typically in lowercase. Examples: data_analysis.RData, my_data.csv.
camelCase: each word within a compound word is capitalized, except for the first word, and no spaces or underscores are used to separate the words. Examples: calculateMean, summaryStatistics.
PascalCase: the first letter of each word in a compound word is capitalized, and no spaces or underscores are used to separate the words. Examples: DataAnalysis, DescriptiveStatistics.
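The three conventions applied to R object names, as a sketch. The names are illustrative and the built-in cars data set is used only to have something to compute with.

```r
# snake_case
mean_speed <- mean(cars$speed)

# camelCase
calculateMean <- function(x) sum(x) / length(x)

# PascalCase
DescriptiveStatistics <- summary(cars$speed)

calculateMean(cars$speed)
```

Whichever convention you pick, using it consistently within a project matters more than the choice itself.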
RStudio does this for you!
Put spaces around operators (=, +, -, <-). Use x <- 5, not x<-5.
Exception: spaces around = are optional when passing parameters in a function call.
or
c(1, 2, 3)
sum(a = 1, b = 2)
Bad examples:
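The spacing rules can be sketched side by side. The original examples are not reproduced here, so the lines below are illustrative. Both versions run and give the same result; the difference is readability only.

```r
# Good: spaces around operators and after commas
x <- 5
y <- c(1, 2, 3)
total <- sum(a = 1, b = 2)

# Also accepted: no spaces around = inside a function call
total <- sum(a=1, b=2)

# Bad: runs, but is harder to read
x<-5
y<-c(1,2,3)
total<-sum(a=1,b=2)
```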
Comments
Gerko Vink @ Anton de Kom Universiteit, Paramaribo